
Evaluation: Add question id #553

Merged: AkhileshNegi merged 3 commits into main from enhancement/evaluation-add-question-id on Jan 28, 2026

Conversation

AkhileshNegi (Collaborator) commented Jan 22, 2026

Summary

Target issue is #544

Checklist

Before submitting a pull request, please ensure that you mark these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested.
  • If you've fixed a bug or added code, ensure it is tested and has test cases.

Notes

  • New Features

    • Question ID tracking added across evaluation runs, dataset uploads, trace creation, and result items to group duplicates and maintain consistent per-question identifiers (a grouping sketch follows this list).
  • Tests

    • Added tests covering question ID propagation, sequencing, duplicate grouping, and missing-ID cases.
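
To make the duplicate-grouping idea above concrete, here is a minimal sketch; group_results_by_question and the flat result-dict shape are illustrative assumptions, not the PR's actual code:

from collections import defaultdict
from typing import Any

def group_results_by_question(
    results: list[dict[str, Any]],
) -> dict[int, list[dict[str, Any]]]:
    """Group evaluation result items that share a question_id."""
    grouped: defaultdict[int, list[dict[str, Any]]] = defaultdict(list)
    for result in results:
        grouped[result["question_id"]].append(result)  # duplicates share one ID
    return dict(grouped)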

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Propagate a per-question question_id through evaluation parsing, Langfuse dataset item metadata, trace creation, and trace/score retrieval. IDs are generated sequentially (1-based) per original input and applied consistently to duplicates; parsing and uploads were extended to accept and carry question_id.
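
As a rough illustration of that sequencing, consider the sketch below; build_upload_payloads, the item shape, and the duplicates parameter are assumptions for illustration, not the real upload_item code in langfuse.py:

def build_upload_payloads(
    items: list[dict[str, str]], duplicates: int
) -> list[dict[str, object]]:
    """Attach a shared, 1-based question_id to every duplicate of each item."""
    payloads: list[dict[str, object]] = []
    for question_id, item in enumerate(items, start=1):  # 1-based per original input
        for duplicate_num in range(duplicates):
            payloads.append(
                {
                    "input": item,
                    "metadata": {
                        "question_id": question_id,  # identical across duplicates
                        "duplicate_num": duplicate_num,
                    },
                }
            )
    return payloads

Calling build_upload_payloads([{"q": "a"}, {"q": "b"}], duplicates=2) yields four payloads whose metadata question_id values are 1, 1, 2, 2.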

Changes

Langfuse Integration Core (backend/app/crud/evaluations/langfuse.py)
Extended the upload_item signature to include question_id and attach it to item metadata and trace.metadata; dataset upload assigns a 1-based question_id per original item and propagates it to duplicates; fetch_trace_scores_from_langfuse now initializes and reads question_id from trace metadata.

Evaluation Processing (backend/app/crud/evaluations/processing.py)
parse_evaluation_output now extracts question_id from dataset_item.metadata.question_id and includes it in each parsed result (see the sketch below).

Tests (backend/app/tests/crud/evaluations/test_langfuse.py, backend/app/tests/crud/evaluations/test_processing.py)
Added tests validating question_id propagation (present and absent), sequential 1-based assignment across items, consistent IDs for duplicates, and parsing behavior.
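
The metadata read in the Evaluation Processing entry can be sketched as follows; extract_question_id and the plain-dict dataset_item shape are assumptions, not the Langfuse SDK types or the actual parse_evaluation_output signature:

from typing import Any

def extract_question_id(dataset_item: dict[str, Any]) -> int | None:
    """Read question_id back out of a dataset item's metadata, if present."""
    metadata = dataset_item.get("metadata") or {}
    value = metadata.get("question_id")
    return int(value) if value is not None else None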

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Evaluation: Fix score format #549 — touches the same evaluation pipeline and alters trace/score metadata formats; likely related to question_id/metadata shape changes.

Suggested reviewers

  • kartpop
  • nishika26
  • Prajna1999

Poem

🐰 I hop through traces with a tiny tag,
A number for each question in my bag,
Duplicates march along in tune,
Counts start at one beneath the moon,
Metadata shines — a carrot-sized flag. 🥕

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

  • Title check: ✅ Passed. The title 'Evaluation: Add question id' clearly and concisely summarizes the primary change: adding question_id tracking to the evaluation system, consistent with the changes across multiple files in the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 93.33%, which is sufficient; the required threshold is 80.00%.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.



AkhileshNegi linked an issue Jan 22, 2026 that may be closed by this pull request
AkhileshNegi self-assigned this Jan 22, 2026
AkhileshNegi added the enhancement label Jan 22, 2026
AkhileshNegi marked this pull request as ready for review January 22, 2026 03:16
codecov bot commented Jan 22, 2026

Codecov Report

❌ Patch coverage is 98.87640% with 1 line in your changes missing coverage. Please review.

Files with missing lines:
  • backend/app/crud/evaluations/langfuse.py: patch 91.66%, 1 line missing ⚠️


coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/langfuse.py (1)

254-296: Fix question_id type hint (int vs str).

question_id is assigned as an integer, but the helper signature declares str, which will trip type checkers and contradicts tests.

🐛 Suggested fix
-    def upload_item(item: dict[str, str], duplicate_num: int, question_id: str) -> bool:
+    def upload_item(item: dict[str, str], duplicate_num: int, question_id: int) -> bool:
🧹 Nitpick comments (3)
backend/app/tests/crud/evaluations/test_processing.py (1)

70-98: Add a type annotation to the new test method signature.

Per the project requirement, parameters should be typed; this applies to test methods too.

♻️ Suggested update
-from typing import Any
+from typing import Any, Self
@@
-    def test_parse_evaluation_output_without_question_id(self) -> None:
+    def test_parse_evaluation_output_without_question_id(self: Self) -> None:
As per coding guidelines, please add type hints to all function parameters.
backend/app/tests/crud/evaluations/test_langfuse.py (1)

499-571: Type annotate valid_items in the new tests.

The new test methods introduce untyped parameters.

♻️ Suggested update
-    def test_upload_dataset_to_langfuse_question_id_in_metadata(self, valid_items):
+    def test_upload_dataset_to_langfuse_question_id_in_metadata(
+        self, valid_items: list[dict[str, str]]
+    ) -> None:
@@
-    def test_upload_dataset_to_langfuse_same_question_id_for_duplicates(
-        self, valid_items
-    ):
+    def test_upload_dataset_to_langfuse_same_question_id_for_duplicates(
+        self, valid_items: list[dict[str, str]]
+    ) -> None:
As per coding guidelines, please add type hints to all function parameters.
backend/app/crud/evaluations/langfuse.py (1)

424-447: Keep question_id type consistent in fetched trace data.

Defaulting to "" introduces a str/int mix; prefer None when absent to align with upstream.

♻️ Suggested update
-                    "question_id": "",
+                    "question_id": None,
@@
-                    trace_data["question_id"] = trace.metadata.get("question_id", "")
+                    trace_data["question_id"] = trace.metadata.get("question_id")

coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
backend/app/crud/evaluations/langfuse.py (1)

256-297: Add question_id field to TraceData TypedDict and normalize type to int.

The question_id field is missing from the TraceData TypedDict definition in backend/app/crud/evaluations/score.py (lines 20–27), yet the code attempts to set it at lines 421–427 and 449. Additionally, the function parameter is annotated as str (line 256) but assigned as int from enumerate (line 287), then defaulted to "" string (line 426) when reading.

Fix:

  1. Add question_id: int to the TraceData TypedDict (score.py)
  2. Change the parameter annotation from question_id: str to question_id: int (langfuse.py, line 256)
  3. Initialize trace_data["question_id"] = 0 instead of "" (langfuse.py, line 426)
  4. Convert the metadata value to int on read: trace_data["question_id"] = int(trace.metadata.get("question_id", 0)) if trace.metadata.get("question_id") else 0 (langfuse.py, line 449)
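
Taken together, those four steps might look like the sketch below; trace_id and scores are placeholders for whatever TraceData actually declares in score.py:

from typing import Any, TypedDict

class TraceData(TypedDict):
    trace_id: str             # placeholder for existing fields
    scores: dict[str, float]  # placeholder for existing fields
    question_id: int          # new field; 0 when absent from trace metadata

def read_question_id(metadata: dict[str, Any] | None) -> int:
    """Normalize the metadata value to int, defaulting to 0 when missing."""
    raw = (metadata or {}).get("question_id")
    return int(raw) if raw else 0  # accepts str or int values from the metadata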

AkhileshNegi merged commit 44b70ac into main Jan 28, 2026
3 checks passed
AkhileshNegi deleted the enhancement/evaluation-add-question-id branch January 28, 2026 07:27

Labels

enhancement (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Evaluation: Parent ID to run traces

2 participants